25 research outputs found

    Determining and comparing protein function in Bacterial genome sequences

    FunGeneClusterS: Predicting fungal gene clusters from genome and transcriptome data

    Introduction: Secondary metabolites of fungi are receiving increasing interest because of their prolific bioactivities and because fungal biosynthesis of secondary metabolites often occurs from co-regulated and co-located gene clusters. This makes the gene clusters attractive for synthetic biology and industrial biotechnology applications. We have previously published a method for accurate prediction of clusters from genome and transcriptome data, which could also suggest cross-chemistry; however, this method was limited both in the number of adjustable parameters and in user-friendliness. Furthermore, its sensitivity to the transcriptome data required manual curation of the predictions. In the present work, we have aimed at improving these features. Results: FunGeneClusterS is an improved implementation of our previous method with a graphical user interface for off- and on-line use. The new method adds options to adjust the size of the gene cluster(s) being sought, as well as an option for the algorithm to be flexible with genes in the cluster that may not seem to be co-regulated with the remainder of the cluster. We have benchmarked the method using data from the well-studied Aspergillus nidulans and found that the method is an improvement over the previous one. In particular, it predicts clusters with more than 10 genes more accurately and allows identification of co-regulated gene clusters irrespective of the function of the genes. It also greatly reduces the need for manual curation of the prediction results. We furthermore applied the method to transcriptome data from A. niger. Using the identified best set of parameters, we were able to identify clusters for 31 out of 76 previously predicted secondary metabolite synthases/synthetases. Furthermore, we identified additional putative secondary metabolite gene clusters. In total, we predicted 432 co-transcribed gene clusters in A. niger (spanning 1,323 genes, 12% of the genome). Some of these had functions related to primary metabolism, e.g., we identified a cluster for biosynthesis of biotin, as well as several for degradation of aromatic compounds. The data suggest that larger parts of the fungal genome than previously anticipated operate as gene clusters. This includes both primary and secondary metabolism as well as other cellular maintenance functions. Conclusion: We have developed FunGeneClusterS in a graphical implementation and made the method capable of adjustments to different datasets and target clusters. The method is versatile in that it can predict co-regulated clusters not limited to secondary metabolism. Our analysis of data has shown not only the validity of the method, but also strongly suggests that large parts of fungal primary metabolism and cellular functions are both co-regulated and co-located.
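The idea of finding co-located, co-regulated genes with an adjustable cluster size and some tolerance for genes that do not follow the pattern can be illustrated with a minimal sketch. This is not the FunGeneClusterS algorithm, whose details the abstract does not give. It assumes genes are listed in chromosomal order, that co-regulation is measured by Pearson correlation of expression profiles, and that a cluster may bridge a limited number of uncorrelated genes. The function names, the parameters `min_corr`, `min_size` and `max_skips`, and the toy data are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-regulation signal
    return cov / math.sqrt(vx * vy)

def find_clusters(profiles, min_corr=0.8, min_size=3, max_skips=1):
    """Greedy scan along the chromosome (illustrative assumption only).

    profiles: expression vectors, one per gene, in chromosomal order.
    Returns (start, end) gene-index pairs, inclusive, for each cluster.
    """
    clusters = []
    n = len(profiles)
    i = 0
    while i < n:
        members = [i]
        last = i    # most recently accepted cluster member
        skips = 0   # consecutive uncorrelated genes since `last`
        j = i + 1
        while j < n:
            if pearson(profiles[last], profiles[j]) >= min_corr:
                members.append(j)
                last = j
                skips = 0
            else:
                skips += 1
                if skips > max_skips:
                    break
            j += 1
        if len(members) >= min_size:
            clusters.append((i, last))
            i = last + 1
        else:
            i += 1
    return clusters

# Toy data: six hypothetical genes measured under four conditions.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],  # anti-correlated gene sitting inside the cluster
    [1, 2, 3, 5],
    [5, 1, 4, 2],
    [3, 3, 3, 3],
]
clusters = find_clusters(profiles)
strict = find_clusters(profiles, max_skips=0)
```

In this toy run, allowing one skipped gene lets the scan bridge the anti-correlated third gene, while `max_skips=0` finds no cluster of at least three genes. That is the kind of flexibility the abstract describes, although the real method may implement it quite differently.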

    Bayesian prediction of bacterial growth temperature range based on genome sequences

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict the preferred temperature range from a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinguishing between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content, and genome size). When using naïve Bayesian inference, it was possible to correctly predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by including protein families as well as structural features, compared to either of these alone. A dedicated computer program was created to perform these predictions. Conclusions: This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic-, mesophilic- and psychrophilic-adapted bacterial genomes.
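To make the naïve Bayesian approach and the Matthews correlation coefficient concrete, here is a minimal sketch. It is not the program described in the abstract. It assumes each genome is reduced to the set of protein families it contains (presence or absence only), uses a Bernoulli naïve Bayes model with Laplace smoothing, and computes the standard binary form of the Matthews correlation coefficient. The function names, the smoothing parameter `alpha`, and the four-family toy data are hypothetical.

```python
import math

def train(genomes, labels, n_features, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    genomes: list of sets holding the indices of protein families present.
    labels:  thermophilicity class for each genome.
    Returns {class: (log_prior, [P(family present | class), ...])}.
    """
    classes = sorted(set(labels))
    counts = {c: [0] * n_features for c in classes}
    totals = {c: 0 for c in classes}
    for present, label in zip(genomes, labels):
        totals[label] += 1
        for f in present:
            counts[label][f] += 1
    model = {}
    for c in classes:
        log_prior = math.log(totals[c] / len(labels))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(counts[c][f] + alpha) / (totals[c] + 2 * alpha)
                 for f in range(n_features)]
        model[c] = (log_prior, probs)
    return model

def predict(model, present):
    """Return the class with the highest posterior log-probability."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for f, p in enumerate(probs):
            score += math.log(p if f in present else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy training set: four hypothetical protein families, six genomes.
genomes = [{0, 1}, {0}, {2}, {1, 2}, {3}, {1, 3}]
labels = ["thermophile", "thermophile", "mesophile",
          "mesophile", "psychrophile", "psychrophile"]
model = train(genomes, labels, n_features=4)
prediction = predict(model, {0})
```

A coefficient of 1 means perfect agreement between predicted and true classes, 0 means no better than chance, and -1 means total disagreement, which puts the reported value of 0.68 in context.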